Tradeoffs and Guidelines for Selecting Technologies to Generate Web Content from Relational Data Stores

Author

  • Frank Sigvald Haug
Abstract

In order to evaluate approaches for offering relational database classes on the web, we recognized the need to extract web page content from relational database management systems. "Content" in this context refers to dynamic XML data that is reformatted into HTML pages. This paper describes the issues, advantages, and limitations of two approaches for extracting web content dynamically. It is based on the findings of a research project that implemented a small, representative set of transformations from a common relational database. The study focused on the deployment of information (not its creation or modification). The first approach relied on the vendor-supplied XML support in Microsoft SQL Server 2000 (the MS SQL XML facility). The second used a custom servlet written in Java. Emphasis was placed on identifying guidelines and tradeoffs between vendor-supplied, minimal-scripting solutions and custom-developed programmatic solutions.

The Research Project Description

The Web introduced the end user to a whole new world of possibilities with respect to how information is accessed and stored. Increases in communication bandwidth, and in computer processing power, memory, and storage, have made remote access to information practical and reliable. Acceptance of standards such as HTML [1] encouraged the separation of formatting from content. Other standards, such as XML [2], have further formalized this separation by allowing authors to separate semantic information from data, and they also provide a uniform mechanism for validating both semantic and syntactic content. While these standards are useful for documents created by hand, they are much more powerful when used to create content dynamically from a live system. Here, additional standards such as XSL [3], XPath [4], and XDR [5] provide a framework for transforming and selecting content based on semantic information. There is a large selection of facilities implementing these standards to choose from when designing a dynamic-content web system. These facilities make it possible, and practical, to generate different up-to-date views of information dynamically, either interactively or by batch processes run at predefined intervals after the content has changed. Choosing the right alternatives from the available facilities requires careful analysis and planning.

Limited Project Scope

The project focused on dynamic retrieval and encoding of information. This avoided standard database issues such as concurrency, locking, and so on, and allowed the project to concentrate on the issues related to the specific retrieval and encoding implementations rather than the issues involved in traditional database applications. The project compared two approaches: a vendor-supplied approach using Microsoft's MS SQL XML facility in SQL Server 2000, and a custom-servlet approach written in Java using the then-current JDK (1.3), JDBC (2.0), and JSDK (1.1), deployed in the Apache Tomcat servlet container from the Jakarta project.

Common Components (Used by Both Approaches)

While the project evaluated two different approaches, the facilities implementing standard and platform services were shared between the two approaches as much as possible. In particular, the same operating system, web server, and relational database services were used. Both approaches ran on the same machines at the same time, using the same tables, database, and Microsoft SQL Server 2000 RDBMS. Both used Microsoft's Internet Information Server (IIS) version 4.0, running on Windows NT Server 4.0 Service Pack 6a, for the web and platform services, and both used Microsoft's Internet Explorer 5.5 as the client. The latest version of Microsoft's XML/XSL parsing service was used on the client (although, due to vendor requirements, an older version was also used in parallel with the latest version on the web server for the vendor-supplied approach).

Common Functionality

Because the MS SQL XML facility is a commercial-quality application, it supports more features and facilities than could be implemented by the second approach within the limits of the research project. However, a core set of functionality was selected for implementation to enable comparisons and to act as a baseline for any future facility implementations or feature enhancements. Both facilities respond to requests for content that is dynamically generated from database queries. These queries can optionally include parameters that are specified per execution. The queries and parameters can be specified in files stored on the server (template and properties files) or passed as part of the HTTP URL request (for example, selecting values based on an HTML form input field such as a customer id or PIN). Both approaches have essentially the same limits with respect to query/result size and complexity. Results of SQL queries can be encoded in many ways, and the MS SQL XML facility has several different options available. These options are actually implemented in the SQL Server database, and therefore are also available to the custom servlet approach. Relying on them would not be useful for comparison purposes, however, since the MS SQL XML support is included as part of SQL Server 2000. In other words, the Java approach was implemented against SQL Server 2000 to simplify comparisons, but in practice it was intended to demonstrate an alternative that did not depend on MS SQL Server 2000. (Why re-implement a facility if it is already there?) Results for the custom servlet were encoded by creating one XML element per row, with column names stored as attribute names and column values stored as attribute values within that element (equivalent to one of the formats supported by MS SQL XML), as illustrated below.
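The paper's original example is not reproduced in this excerpt; the following hypothetical fragment sketches the row-per-element encoding just described, assuming a Customers table with CustomerID and Name columns and "row" as the element name:

    <!-- Hypothetical result of: SELECT CustomerID, Name FROM Customers -->
    <root>
      <row CustomerID="1" Name="Alice Adams"/>
      <row CustomerID="2" Name="Bob Brown"/>
    </root>

This attribute-centric, row-per-element layout is what makes the custom servlet's output directly comparable to one of the row-level formats that MS SQL XML itself can produce.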
Architectural Overview

Both approaches share a common architectural structure (see Figure 1). The components can be grouped into one of three areas: Client, Server, and Web Site (within Server). The Client Area refers to the components that exist on, or are delivered to, the computer running the client (i.e., the web browser). The Server Area refers to the components that exist on the server (as opposed to the client). Lastly, the Web Site Area refers to the components that exist on the server side but are part of a normal web site (as opposed to the components that are specific to one approach).

Figure 1: Common architecture overview (diagram of the Client, Server, and Web Site areas, showing IIS, SQL Server, style sheets, XML/HTML pages, template and configuration files, Internet Explorer 5.5, and the Dynamic Content Generator; not reproduced here).

Two of the areas are essentially unchanged for both approaches, namely the Client Area and the Web Site Area (see Figure 1). The Client Area simply contains Internet Explorer and the content that is ultimately delivered to it. The Web Site Area contains the web server software itself (IIS) and the normal files that make up a web site (static HTML, XML, XSL, CSS, etc.). While the configuration and content of the files in the web site differ between approaches, the arrangement and types of components are common to both. The Server Area includes the database, the implementation of each specific approach, any files necessary to configure or direct the given approach, and the actual content generated by the approach (see Figure 1).

Generically speaking, the client requests content from the web site. The web site determines whether the request refers to its "normal" files and, if so, delivers the appropriate content. If the request is for dynamic content, the web site forwards the request to the appropriate components in the Server Area. The Dynamic Content Generator in Figure 1 is a generic placeholder for the specific components responsible for processing this request in a given approach. These components use various files and database requests to generate and format the dynamic content, which is then returned to the web site and ultimately to the client.
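For the custom-servlet approach, the Dynamic Content Generator role is played by a Java servlet that runs the requested query over JDBC and writes the XML encoding described above to the HTTP response. The following is a minimal sketch of such a servlet, not the project's actual code; the JDBC driver, connection URL, credentials, table, and parameter names are placeholders chosen for illustration:

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class XmlQueryServlet extends HttpServlet {

        public void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            response.setContentType("text/xml");
            PrintWriter out = response.getWriter();
            out.println("<?xml version=\"1.0\"?>");
            out.println("<root>");
            try {
                // Register a JDBC driver (placeholder; the project's actual driver is not known).
                Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

                // A per-execution parameter taken from the HTTP request, as described above.
                String customerId = request.getParameter("CustomerID");

                Connection con = DriverManager.getConnection(
                        "jdbc:odbc:WebContentDSN", "user", "password");
                PreparedStatement stmt = con.prepareStatement(
                        "SELECT CustomerID, Name FROM Customers WHERE CustomerID = ?");
                stmt.setString(1, customerId);
                ResultSet rs = stmt.executeQuery();
                ResultSetMetaData meta = rs.getMetaData();

                while (rs.next()) {
                    // One XML element per row; column names become attribute names and
                    // column values become attribute values (real code would XML-escape them).
                    StringBuffer row = new StringBuffer("<row");
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        row.append(' ').append(meta.getColumnName(i))
                           .append("=\"").append(rs.getString(i)).append('"');
                    }
                    row.append("/>");
                    out.println(row.toString());
                }
                rs.close();
                stmt.close();
                con.close();
            } catch (Exception e) {
                out.println("<error>" + e.getMessage() + "</error>");
            }
            out.println("</root>");
        }
    }

The resulting XML can then be reformatted into HTML with an XSL style sheet, either on the server or by Internet Explorer on the client.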
MS SQL XML Architecture

Figure 2 shows the architecture specific to the vendor-supplied approach. Here the MS SQL XML component (a set of DLLs supplied by Microsoft) takes the place of the Dynamic Content Generator from Figure 1. It is implemented as an ISAPI redirector DLL, which means that it can be plugged directly into IIS, and IIS can then forward requests to it and receive responses from it.

Figure 2: MS SQL XML architecture (diagram not reproduced here).
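To make the vendor-supplied request path concrete: queries handled by the MS SQL XML ISAPI extension are typically described in server-side template files and addressed by URL through an IIS virtual directory. The fragment below is a hypothetical illustration of such a template; the virtual directory, table, column, and parameter names are assumptions rather than details taken from the project:

    <?xml version="1.0"?>
    <!-- Hypothetical template file, e.g. templates/customer.xml under an IIS virtual directory -->
    <root xmlns:sql="urn:schemas-microsoft-com:xml-sql">
      <sql:header>
        <!-- Default value used when the URL supplies no CustomerID parameter -->
        <sql:param name="CustomerID">1</sql:param>
      </sql:header>
      <sql:query>
        SELECT CustomerID, Name
        FROM   Customers
        WHERE  CustomerID = @CustomerID
        FOR XML AUTO
      </sql:query>
    </root>

A request such as http://server/vdir/templates/customer.xml?CustomerID=42 would then return the query result encoded as XML, which can be reformatted into HTML with an XSL style sheet on the server or in Internet Explorer on the client.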

Publication date: 2001